ABSTRACT

A method is presented for automated photometric classification of supernovae (SNe) as
Type-Ia or non-Ia. A two-step approach is adopted in which: (i) the SN lightcurve flux
measurements in each observing filter are fitted separately to an analytical parameterised
function that is sufficiently flexible to accommodate virtually all types of SNe; and (ii) the
fitted function parameters and their associated uncertainties, along with the number of flux
measurements, the maximum-likelihood value of the fit and the Bayesian evidence for the
model, are used as the input feature vector to a classification neural network that outputs
the probability that the SN under consideration is of Type-Ia. The method is trained and
tested using data released following the SuperNova Photometric Classification Challenge
(SNPCC), consisting of lightcurves for 20,895 SNe in total. We consider several random
divisions of the data into training and testing sets: for instance, for our sample
$\mathcal{D}_1$ ($\mathcal{D}_4$), a total of 10 (40) per cent of the data are involved in
training the algorithm and the remainder used for blind testing of the resulting classifier;
we make no selection cuts. Assigning a canonical threshold probability of
$p_{\rm th} = 0.5$ on the network output to class a SN as Type-Ia, for the sample
$\mathcal{D}_1$ ($\mathcal{D}_4$) we obtain a completeness of 0.78 (0.82), purity of 0.77
(0.82), and SNPCC figure-of-merit of 0.41 (0.50). Including the SN host-galaxy redshift and
its uncertainty as additional inputs to the classification network results in a modest 5-10
per cent increase in these values. We find that the quality of the classification does not
vary significantly with SN redshift. Moreover, our probabilistic classification method allows
one to calculate the expected completeness, purity and figure-of-merit (or other measures of
classification quality) as a function of the threshold probability $p_{\rm th}$, without
knowing the true classes of the SNe in the testing sample, as is the case in the
classification of real SNe data. The method may thus be improved further by optimising
$p_{\rm th}$ and can easily be extended to divide non-Ia SNe into their different classes.
1 INTRODUCTION

Much interest in supernovae (SNe) over the last decade has been focussed on Type-Ia (SNIa)
for their use as 'standardizable' candles in constraining cosmological models. Indeed,
observations of SNIa led to the discovery of the accelerated expansion of the universe
(Riess et al. 1998; Perlmutter et al. 1999), which is usually interpreted as evidence for the
existence of an exotic dark energy component. Ongoing observations of large samples of SNIa
are being used to improve the measurement of luminosity distance as a function of redshift,
and thereby constrain cosmological parameters further (e.g. Kessler et al. 2009;
Benitez-Herrera et al. 2012; Sullivan et al. 2011; Conley et al. 2011; March et al. 2011)
and improve our knowledge of dark energy (e.g. Mantz et al. 2010; Blake et al. 2011).
Moreover, the gravitational lensing of SNIa by foreground cosmic structure along their
lines-of-sight has been used to constrain cosmological parameters (Metcalf 1999;
Dodelson & Vallinotto 2006; Zentner & Bhattacharya 2009) and the properties of the lensing
matter (Rauch 1991; Metcalf & Silk 1999; Jonsson et al. 2007; Kronborg et al. 2010;
Jonsson et al. 2010a; Jonsson et al. 2010b; Karpenka et al. 2012). In addition to their
central role in cosmology, the astrophysics of SNIa is also of interest in its own right, and
much progress has been made in understanding these objects in recent years (e.g.
Hillebrandt & Niemeyer 2000).

Other types of SNe are also of cosmological interest. Type II 
Plateau Supernovae (SNII-P), for example, can also be used as dis- 
tance indicators, although only for smaller distances and to lower 
accuracy than SNIa. Compared to SNIa, however, for which there 
is still uncertainty regarding the progenitor system, SNII-P explosions are better
understood. Furthermore, since SNII-P have only been found in late-type galaxies, biasses
from environmental effects will most probably have a smaller effect on distance measurements
using SNII-P. Thus, the differences between the two types of SNe will result in different
systematic effects, allowing SNII-P data to complement SNIa analyses (D'Andrea et al. 2010).

Although not used directly in cosmology, other classes of SNe are a potential source of
contamination when attempting to compile SNIa catalogues, most notably SN Ib/c. The
consequences of such contamination have been considered by Homeier (2005). SN Ib/c are also
of considerable astrophysical interest, in particular the nature of their progenitors
(Fryer et al. 2007).

The next generation of survey telescopes, such as the Dark Energy Survey (DES;
Wester & Dark Energy Survey Collaboration 2005; Annis et al. 2011), the Large Synoptic
Survey Telescope (LSST; Tyson 2002; Ivezic et al. 2008), and SkyMapper (Schmidt et al. 2005),
are expected to observe lightcurves for many thousands of SNe, far surpassing the resources
available to confirm the type of each of them spectroscopically. Hence, in order to take
advantage of this large amount of SNe data, it is necessary to develop methods that can
accurately and automatically classify many SNe based only on their photometric lightcurves.

In response to this need, many techniques targeted at SNe photometric classification have
been developed, mostly based on some form of template fitting (Poznanski et al. 2002;
Johnson & Crotts 2006; Sullivan et al. 2006; Poznanski et al. 2007;
Kuznetsova & Connolly 2007; Kunz et al. 2007; Sako et al. 2008; Sako et al. 2011;
Rodney & Tonry 2009; Gong et al. 2010; Falck et al. 2010). In such methods, the lightcurves
in different filters for the SN under consideration are compared with those from SNe whose
types are well established. Usually, composite templates are constructed for each class,
using the observed lightcurves of a number of well-studied, high signal-to-noise SNe (see
Nugent et al. 2002), or spectral energy distribution models of SNe. Such methods can produce
good results, but the final classification rates are very sensitive to the characteristics
of the templates used.

To address this difficulty, Newling et al. (2011) instead fit a parametrised functional form
to the SN lightcurves. These post-processed data are then used in either a kernel density
estimation method or a 'boosting' machine learning algorithm to assign a probability to each
classification output, rather than simply assigning a specific SN type. More recently,
Richards et al. (2012) and Ishida & de Souza (2012) have introduced methods for SN
photometric classification that do not rely on any form of template fitting, but instead
employ a mixture of dimensional reduction of the SNe data coupled with a machine learning
algorithm. Richards et al. (2012) proposed a method that uses semi-supervised learning on a
database of SNe: as a first step they use all of the lightcurves in the database
simultaneously to estimate a low-dimensional representation of each SN, and then they employ
a set of spectroscopically confirmed examples to build a classification model in this reduced
space, which is subsequently used to estimate the type of each unknown SN. Subsequently,
Ishida & de Souza (2012) proposed the use of Kernel Principal Component Analysis as a tool to
find a suitable low-dimensional representation of SNe lightcurves. In constructing this
representation, only a spectroscopically confirmed sample of SNe is used. Each unlabelled
lightcurve is then projected into this space and a k-nearest neighbour algorithm performs the
classification.

In this paper, we present a new method for performing SN classification that also does not
rely on template fitting in a conventional sense, but combines parametrised functional
fitting of the SN lightcurves together with a machine learning algorithm. Our method is very
straightforward, and might reasonably even be described as naive, but nonetheless yields
accurate and robust classifications.

The outline of this paper is as follows. In Sec. 2 we describe the data set used for
training and testing in our analysis, and we present a detailed account of our methodology
in Sec. 3. We test the performance of our approach in Sec. 4 by applying it to the data and
present our results. Finally, we conclude in Sec. 5.



2 POST-SNPCC SIMULATED DATA SET

The data set we will use for training and testing our classification algorithm described
below is that released by the SuperNova Photometric Classification Challenge (SNPCC), which
consists of simulations of lightcurves in several filters for 20,895 SNe
(Kessler et al. 2010). The simulations were made using the SNANA package
(Kessler et al. 2009) according to DES specifications. We used the updated version of the
simulated data set (post-SNPCC), which was made public after the challenge results were
released. This updated data set is quite different from the one used in the challenge
itself, owing to some bug fixes and other improvements aimed at a more realistic simulation
of the data expected for DES. The data set contains SNe of Types Ia, Ib/c, Ib, Ic, IIn, II-P
and II-L in the approximate proportions 25, 1, 7, 5, 9, 51 and 2 per cent, respectively.

This large data set, which we denote by $\mathcal{D}$, is made up of two simulated
subsamples: a small 'spectroscopically-confirmed' sample of 1,103 SNe, which we denote by
$\mathcal{S}$, and a 'photometric' sample of 19,792 SNe, denoted by $\mathcal{P}$. The
$\mathcal{S}$ subsample consists of simulated lightcurves for a set of SNe that one could
follow up with a fixed amount of spectroscopic resources on each of a 4-m and 8-m class
telescope. The magnitude limits were assumed to be 21.5 (r band) for the 4-m and 23.5
(i band) for the 8-m telescope. Since spectroscopy is more demanding than photometry,
$\mathcal{S}$ consists of SNe that on average have higher observed brightnesses and much
lower host-galaxy redshifts than those in the photometric sample $\mathcal{P}$.
Consequently, $\mathcal{S}$ is not a random subset of $\mathcal{D}$, but instead has a much
higher fraction of SNIa.

For each one of the SNe in $\mathcal{D}$, we denote the data by
$\mathcal{D}_i = \{t^{\alpha}_{i,k}, F^{\alpha}_{i,k}, \sigma^{\alpha}_{i,k}\}$, where $i$
indexes the SNe, $\alpha \in \{g, r, i, z\}$ denotes the filter,
$k = 1, 2, \ldots, n^{\alpha}_i$ indexes the flux measurements in filter $\alpha$,
$t^{\alpha}_{i,k}$ is the time of the given measurement, $F^{\alpha}_{i,k}$ is the flux
measured at time $t^{\alpha}_{i,k}$ and $\sigma^{\alpha}_{i,k}$ is the corresponding
uncertainty. The lightcurve in each filter for each SN is measured on an irregular time grid
that differs from filter to filter and between SNe. For each SN, between 16 and 160
measurements were made (between 4 and 40 measurements in each filter), with a median value
of 101 measurements. We note that some of the SNe lightcurves were observed only before or
after the peak in emission, but were still included in our analysis.



3 ANALYSIS METHODOLOGY 

In order to perform the SNe classification, we adopt a two-step pro- 
cess: 

(i) SNe lightcurves are first fitted to an analytical parameterised
function, in order to standardise the number of input parameters
for each SN that are used in NN training. Provided that the
function is flexible enough to fit important features in typical SN



Automated photometric classification of super novae 3 



lightcurves, but sufficiently restrictive not to allow unreasonable fitted lightcurves, the
resulting representation of the data reduces the chances of overfitting by a NN or any other
machine learning classification algorithm. Functional fitting also circumvents the problem
associated with flux measurements being made on an irregular time grid. Details of the
function fitting are given in Sec. 3.1.

(ii) The function parameters and their associated errors obtained by fitting the analytical
form to the SNe lightcurves, along with the number of flux measurements, the
maximum-likelihood value of the fit and the Bayesian evidence of the model, are then used as
input parameters to a classification NN that outputs the probability that the SN under
consideration is of Type Ia. Details of NN training are given in Sec. 3.2.

3.1 Lightcurve fitting 

All lightcurves were fitted with the following parameterised functional form, which is based
on that used in Bazin et al. (2009) and Kessler et al. (2010):

$$ f(t) = A\left[1 + B(t - t_1)^2\right] \frac{e^{-(t - t_0)/T_{\rm fall}}}{1 + e^{-(t - t_0)/T_{\rm rise}}}. \qquad (1) $$

While this form has no particular physical motivation, it is sufficiently general to fit the
shape of virtually all types of SNe lightcurves, including those with double peaks (in
contrast to the fitting function used by Newling et al. 2011). A separate fit was performed
in each filter for each SN. In performing the fit, we assume the simple Gaussian likelihood
function

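$$ \mathcal{L}(\Theta) = \exp\left[-\tfrac{1}{2}\chi^2(\Theta)\right], \qquad (2) $$

where the parameter vector $\Theta = \{A, B, t_1, t_0, T_{\rm rise}, T_{\rm fall}\}$ and

$$ \chi^2(\Theta) = \sum_{k=1}^{n} \frac{\left[F_k - f(t_k; \Theta)\right]^2}{\sigma_k^2}, \qquad (3) $$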

in which $n$ is the number of flux measurements for the supernova/filter combination under
consideration. The priors assumed on the function parameters are listed in Table 1.
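As a minimal sketch of the fitting ingredients just described, the following evaluates the model flux and the Gaussian log-likelihood for one supernova/filter combination. It assumes that equation (1) has the form A[1 + B(t - t1)^2] exp[-(t - t0)/T_fall] / (1 + exp[-(t - t0)/T_rise]); the function names are illustrative and are not taken from the paper or from MultiNest.

```python
import math


def lightcurve_model(t, A, B, t1, t0, T_rise, T_fall):
    """Flux predicted at time t by the assumed form of equation (1)."""
    envelope = A * (1.0 + B * (t - t1) ** 2)
    decline = math.exp(-(t - t0) / T_fall)
    rise = 1.0 + math.exp(-(t - t0) / T_rise)
    return envelope * decline / rise


def chi_squared(theta, times, fluxes, sigmas):
    """Equation (3): uncertainty-weighted sum of squared residuals."""
    return sum(
        ((F - lightcurve_model(t, *theta)) / s) ** 2
        for t, F, s in zip(times, fluxes, sigmas)
    )


def log_likelihood(theta, times, fluxes, sigmas):
    """Natural logarithm of the Gaussian likelihood in equation (2)."""
    return -0.5 * chi_squared(theta, times, fluxes, sigmas)
```

A sampler would call `log_likelihood` repeatedly with trial values of the six parameters drawn from the priors; how that call is wired into a particular nested-sampling package is not shown here.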

In order to estimate the function parameters, we use the MultiNest package
(Feroz & Hobson 2008; Feroz et al. 2009), which is built on the framework of nested sampling
(Skilling 2004; Sivia & Skilling 2006) and is very efficient at exploring posteriors that
may contain multiple modes and/or large (curving) degeneracies, and also calculates the
Bayesian log-evidence, $\log \mathcal{Z}$, for the model. Fig. 1 shows the data and
corresponding fitted functional form in each filter for a typical Type Ia and Type II SN,
respectively. One can see that the form (1) is indeed sufficiently flexible to provide a
good fit to all the lightcurves.

From each fit, the feature vector consisting of the mean values
$\hat{\Theta} = \{\hat{A}, \hat{B}, \hat{t}_1, \hat{t}_0, \hat{T}_{\rm rise}, \hat{T}_{\rm fall}\}$
and standard deviations
$\boldsymbol{\sigma} = \{\sigma_A, \sigma_B, \sigma_{t_1}, \sigma_{t_0}, \sigma_{T_{\rm rise}}, \sigma_{T_{\rm fall}}\}$
of the one-dimensional marginalized posterior distributions of function parameters, along
with the number of flux measurements $n$, the maximum-likelihood value and the Bayesian
evidence, are then used as inputs for NN training. For each SN, the total input vector
consists of the concatenation of the feature vectors for each filter. Since there are 15
values in the feature vector for each filter (six means, six standard deviations and three
fit statistics), resulting in a total of 60 values across all 4 filters, the function
fitting corresponds to a dimensionality reduction relative to the number of flux
measurements for some SNe, but not for others. One facet of the robustness of our approach
is that the same function fitting process is performed for all SNe, irrespective of the
number and times relative to peak brightness of the flux measurements in each filter.
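The bookkeeping behind the 15-per-filter and 60-per-SN counts can be sketched as follows. The container layout (a dictionary keyed by filter name holding a tuple of fit summaries) is a hypothetical choice made for illustration only; the paper does not specify how the values are stored.

```python
FILTERS = ("g", "r", "i", "z")
N_PARAMS = 6  # A, B, t1, t0, T_rise, T_fall


def filter_features(means, stds, n_points, log_l_max, log_z):
    """Build the 15-element feature vector for one filter."""
    if len(means) != N_PARAMS or len(stds) != N_PARAMS:
        raise ValueError("expected six posterior means and six standard deviations")
    return list(means) + list(stds) + [n_points, log_l_max, log_z]


def supernova_features(fits):
    """Concatenate the per-filter vectors in a fixed filter order.

    `fits` maps a filter name to the tuple
    (means, stds, n_points, log_l_max, log_z) for that filter.
    """
    features = []
    for band in FILTERS:
        features.extend(filter_features(*fits[band]))
    return features
```

Keeping the filter order fixed matters because the network sees only positions in the vector, not labels.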
Parameter          Prior
$A$                $\mathcal{U}(10^{-5}, 100)$ on $\log A$
$B$                $\mathcal{U}(10^{-5}, 100)$ on $\log B$
$t_1$              $\mathcal{U}(0, 100)$ MJD
$t_0$              $\mathcal{U}(0, 100)$ MJD
$T_{\rm rise}$     $\mathcal{U}(0, 100)$ MJD
$T_{\rm fall}$     $\mathcal{U}(0, 100)$ MJD

Table 1. Priors on lightcurve parameters, where $\mathcal{U}(a, b)$ denotes a uniform
distribution between the limits a and b.

Figure 2. A 3-layer neural network with 3 inputs, 4 hidden nodes, and 2 
outputs. Image courtesy of Wikimedia Commons. 



3.2 Neural network training 

A multilayer perceptron artificial neural network is the simplest type of network and
consists of ordered layers of perceptron nodes that pass scalar values from one layer to the
next. The perceptron is the simplest kind of node, and maps an input vector
$\mathbf{x} \in \mathbb{R}^n$ to a scalar output $f(\mathbf{x}; \mathbf{w}, \theta)$ via

$$ f(\mathbf{x}; \mathbf{w}, \theta) = \theta + \sum_{i=1}^{n} w_i x_i, \qquad (4) $$

where $\{w_i\}$ and $\theta$ are the parameters of the perceptron, called the 'weights' and
'bias', respectively. We will focus mainly on 3-layer NNs, which consist of an input layer,
a hidden layer, and an output layer as shown in Figure 2. The outputs of the nodes in the
hidden and output layers are given by the following equations:

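$$ \text{hidden layer:} \quad h_j = g^{(1)}(f_j^{(1)}); \quad f_j^{(1)} = \theta_j^{(1)} + \sum_l w_{jl}^{(1)} x_l, \qquad (5) $$

$$ \text{output layer:} \quad y_i = g^{(2)}(f_i^{(2)}); \quad f_i^{(2)} = \theta_i^{(2)} + \sum_j w_{ij}^{(2)} h_j, \qquad (6) $$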

where $l$ runs over input nodes, $j$ runs over hidden nodes, and $i$ runs over output nodes.
The functions $g^{(1)}$ and $g^{(2)}$ are called activation functions and must be bounded,
smooth, and monotonic for our purposes. We use $g^{(1)}(x) = \tanh(x)$ and
$g^{(2)}(x) = x$; the non-linearity of $g^{(1)}$ is essential to allowing the network to
model non-linear functions.
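A forward pass through such a 3-layer network can be sketched in a few lines. This is an illustration of the hidden-layer and output-layer equations with a tanh hidden activation and a linear output activation, written with plain Python lists; the function and argument names are assumptions made here and do not correspond to the SkyNet interface.

```python
import math


def affine(weights, bias, inputs):
    """Return bias[j] + sum_l weights[j][l] * inputs[l] for every node j."""
    return [
        b + sum(w * x for w, x in zip(row, inputs))
        for row, b in zip(weights, bias)
    ]


def forward(x, w1, theta1, w2, theta2):
    """Map an input vector x to the output-layer values y."""
    hidden = [math.tanh(f) for f in affine(w1, theta1, x)]  # hidden layer, tanh
    return affine(w2, theta2, hidden)                       # output layer, linear
```

Here `w1` has one row per hidden node and `w2` one row per output node, so the shapes follow the index conventions of the two layer equations.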

The weights and biasses are the values we wish to determine in our training. As they vary, a
huge range of non-linear mappings from inputs to outputs is possible. In fact, a 'universal
approximation theorem' (Hornik et al. 1990) states that a NN with three or more layers can
approximate any continuous function as long as the activation function is locally bounded,
piecewise continuous, and not a polynomial.

Figure 1. Simulated lightcurve measurements and associated uncertainties (red points) in the
g, r, i and z filters for a Type-Ia (top row) and non-Ia (bottom row) SN, together with the
best-fit function (green line) of the form in equation (1).

In order to classify a set of data using a NN, we need to provide a set of training data
$\mathcal{D} = \{\mathbf{x}^{(i)}, \mathbf{t}^{(i)}\}$, where $\mathbf{x}^{(i)}$ denotes the
input parameters and $\mathbf{t}^{(i)}$ is a vector with membership probabilities for each
class. The log-likelihood function for a classification network is given by

$$ \mathcal{L}(\mathbf{a}) = \sum_{i=1}^{n_t} \sum_{j=1}^{n_c} t_j^{(i)} \log p_j(\mathbf{x}^{(i)}; \mathbf{a}), \qquad (7) $$

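where $n_t$ is the number of training data, $n_c$ is the number of classes, $\mathbf{a}$
denotes the network weights and biasses and $p_j$ is the probability predicted by the NN for
class $j$. These probabilities are calculated using the soft-max function applied to the
outputs $y_j$ from the final layer of the NN:

$$ p_j(\mathbf{x}; \mathbf{a}) = \frac{e^{y_j(\mathbf{x}; \mathbf{a})}}{\sum_{j'} e^{y_{j'}(\mathbf{x}; \mathbf{a})}}. \qquad (8) $$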
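The soft-max mapping and the classification objective can be sketched as below. The sketch assumes the objective is the sum over training examples and classes of t_j log p_j, with p_j given by the soft-max of the final-layer outputs; subtracting the largest output before exponentiating is a standard numerical safeguard added here and does not change the probabilities.

```python
import math


def softmax(outputs):
    """Convert final-layer outputs y_j into class probabilities p_j."""
    peak = max(outputs)  # shift for numerical stability; ratios are unchanged
    exps = [math.exp(y - peak) for y in outputs]
    total = sum(exps)
    return [e / total for e in exps]


def classification_log_likelihood(network_outputs, targets):
    """Sum of t_j * log p_j over all training examples and classes."""
    total = 0.0
    for y, t in zip(network_outputs, targets):
        probs = softmax(y)
        # Terms with t_j = 0 contribute nothing and are skipped.
        total += sum(t_j * math.log(p_j) for t_j, p_j in zip(t, probs) if t_j > 0)
    return total
```

For a two-class Ia/non-Ia problem, `targets` would hold vectors such as `[1, 0]` or `[0, 1]`, and the first soft-max component would play the role of the Type Ia probability.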

Depending on the network architecture, there can be millions of network weights and biasses,
which makes network training a very complicated and computationally challenging task.
Standard methods use the gradient descent algorithm (Rumelhart et al. 1986) for network
training but this does not work well for very deep networks (networks with many hidden
layers). We therefore use the SkyNet package for network training, which uses a 2nd-order
optimisation method based on the conjugate gradient algorithm and has been shown to train
very deep networks efficiently (Feroz et al., in preparation). SkyNet has also been combined
with MultiNest in the BAMBI (Graff et al. 2012) package for fast and robust Bayesian
analysis.

3.3 Application to post-SNPCC data 

In the original SNPCC, participants were given the 'spectroscopically-confirmed' sample
$\mathcal{S}$ and asked to predict the type of SNe in the 'photometric' sample
$\mathcal{P}$. We apply our method to this case in Sec. 4 but, as we comment in Sec. 2,
$\mathcal{S}$ is not a random subset of $\mathcal{D}$, and so is not representative of the
full data set. As discussed in Newling et al. (2011) and Richards et al. (2012), when
training machine-learning methods, including neural networks, the distribution and
characteristics of the training and testing samples (see below) should be as similar as
possible. It can therefore cause difficulties to use $\mathcal{S}$ for training and
$\mathcal{P}$ as the testing sample, since $\mathcal{S}$ constitutes a biassed subsample.

The original rationale for using $\mathcal{S}$ as the training data in the SNPCC was that
limited spectroscopic resources in future surveys are likely to produce a biassed sample of
SNe of known class. Nonetheless, studies of automated methods for photometric classification
of SNe may provide sufficient motivation to modify the spectroscopic follow-up strategy
(Richards et al. 2012), leading to real spectroscopically-confirmed SNe training samples
that more closely represent the larger photometric sample. Therefore, following
Newling et al. (2011), after deciding on the proportions of the data to be used for training
and testing, we also consider partitioning $\mathcal{D}$ randomly.

In fact, in each case, we divide the data among three dif- 
ferent categories: optimisation, validation and testing. Data in the 
optimisation category constitute the $n_t$ examples on which network
weights are optimised using the likelihood function given in
Eq. (7). Data in the validation category are used to guard against
overfitting: when the sum of squared errors on validation data starts 
to increase, the network optimisation is stopped. The combination 
of optimisation and validation data, always in the relative propor- 
tions 75:25 per cent, constitute what we call our 'training' data. 
Data in the testing category are not involved in the training of the 
network at all, and are used to assess the accuracy of the resulting 
classifier. All the results presented in this paper are obtained from 
the testing data-set. 
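The three-way division described above can be sketched as a random partition of indices. The 75:25 optimisation-to-validation ratio within the training data is taken from the text; the use of a seeded shuffle and the rounding of the sample sizes are assumptions of this sketch, not details given in the paper.

```python
import random


def split_sample(n_total, train_fraction, seed=0, validation_share=0.25):
    """Randomly divide indices into optimisation, validation and testing sets."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    n_train = round(train_fraction * n_total)
    n_validation = round(validation_share * n_train)
    validation = indices[:n_validation]
    optimisation = indices[n_validation:n_train]
    testing = indices[n_train:]
    return optimisation, validation, testing
```

For example, `split_sample(1000, 0.2)` yields 150 optimisation, 50 validation and 800 testing indices, with no index shared between the three lists.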

It is interesting to check the improvement in network predictions with the amount of data
used in training. We therefore construct six random training samples $\mathcal{D}_p$
($p = 0, 1, 2, \ldots, 5$), which contain 5, 10, 20, 30, 40 and 50 per cent of the data,
respectively, and use the remainder of the data for testing. Sample $\mathcal{D}_0$ contains
1045 SNe and is thus similar in size to the spectroscopically-confirmed sample $\mathcal{S}$
discussed above. Clearly, as the amount of data used for training increases, one should
expect the accuracies of predictions coming from networks to improve.

For each sample, we use the following input set for network training:
$\mathbf{x}^{(i)} = \{\hat{\Theta}_i^{\alpha}, \boldsymbol{\sigma}_i^{\alpha}, n_i^{\alpha}, (\log \mathcal{L}_{\max})_i^{\alpha}, \log \mathcal{Z}_i^{\alpha}\}$,
where $i$ indexes the SNe allocated to a given optimisation, validation and testing
category, $\alpha \in \{g, r, i, z\}$ denotes the filter, $\hat{\Theta}_i^{\alpha}$ and
$\boldsymbol{\sigma}_i^{\alpha}$ are the means and standard deviations respectively of
function parameters defined in Sec. 3.1, $n_i^{\alpha}$ is the number of flux measurements
for a given SN, $(\log \mathcal{L}_{\max})_i^{\alpha}$ is the maximum-likelihood value of
the fit and $\log \mathcal{Z}_i^{\alpha}$ is the Bayesian log-evidence. Moreover, we also
train further networks with redshift $z_i$ and its uncertainty $\sigma_{z_i}$ included as
additional inputs, in order to determine whether they can improve the SNe classification.

Table 2. Completeness, purity and figure of merit for Type-Ia SNe classification obtained by
applying trained networks to the various testing data-sets with threshold probability
$p_{\rm th}$ set to 0.5.



4 RESULTS 

We use a 3-layered perceptron neural network with 500 nodes in the hidden layer. Once the
network has been trained, it is applied to the testing data-set to obtain the predictions
for each SN therein being either Type-Ia or non-Ia. To perform this classification, we first
need to pick a threshold probability $p_{\rm th}$ such that all SNe for which the network
output probability of being Type-Ia is larger than $p_{\rm th}$ are identified as Type-Ia
candidates. One can then calculate the completeness $\epsilon_{\rm Ia}$ (fraction of all
Type Ia SNe that have been correctly classified; also often called the efficiency), purity
$\tau_{\rm Ia}$ (fraction of all Type Ia candidates that have been classified correctly) and
figure of merit $\mathcal{F}_{\rm Ia}$ for Type Ia SNe. These quantities are defined as
follows:



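$$ \epsilon_{\rm Ia} = \frac{N_{\rm Ia}^{\rm true}}{N_{\rm Ia}^{\rm total}}, \qquad (9) $$

$$ \tau_{\rm Ia} = \frac{N_{\rm Ia}^{\rm true}}{N_{\rm Ia}^{\rm true} + N_{\rm Ia}^{\rm false}}, \qquad (10) $$

$$ \mathcal{F}_{\rm Ia} = \frac{1}{N_{\rm Ia}^{\rm total}} \frac{(N_{\rm Ia}^{\rm true})^2}{N_{\rm Ia}^{\rm true} + W N_{\rm Ia}^{\rm false}}, \qquad (11) $$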



where $N_{\rm Ia}^{\rm total}$ is the total number of Type Ia SNe in the sample,
$N_{\rm Ia}^{\rm true}$ is the number of SNe correctly predicted to be of Type Ia,
$N_{\rm Ia}^{\rm false}$ is the number of SNe incorrectly predicted to be of Type Ia and
$W$ is a penalty factor which controls the relative penalty for false positives over false
negatives. For SNPCC $W = 3$.
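The three quality measures can be computed from a list of network output probabilities and the corresponding true classes as sketched below. The sketch assumes the SNPCC figure of merit takes the form (1/N_total) (N_true)^2 / (N_true + W N_false); the function name and the handling of the case with no candidates are choices made here for illustration.

```python
def classification_metrics(p_ia, is_ia, p_th=0.5, W=3.0):
    """Return (completeness, purity, figure of merit) for Type Ia selection."""
    selected = [p > p_th for p in p_ia]
    n_total = sum(1 for truth in is_ia if truth)
    n_true = sum(1 for s, truth in zip(selected, is_ia) if s and truth)
    n_false = sum(1 for s, truth in zip(selected, is_ia) if s and not truth)
    if n_total == 0:
        raise ValueError("sample contains no Type Ia SNe")
    completeness = n_true / n_total
    if n_true + n_false == 0:
        # No candidates were selected, so purity is undefined.
        return completeness, float("nan"), 0.0
    purity = n_true / (n_true + n_false)
    figure_of_merit = n_true ** 2 / (n_total * (n_true + W * n_false))
    return completeness, purity, figure_of_merit
```

With W = 3, each false positive costs three times as much in the denominator as a true positive contributes, which is why a classifier with high completeness but low purity scores poorly.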

Our classification results using a canonical threshold probability $p_{\rm th} = 0.5$ are
summarised in Table 2. It can be clearly seen that there is a significant improvement in the
results as more training data are used. Moreover, comparing the results for training samples
$\mathcal{S}$ and $\mathcal{D}_0$, which are similar in size, one sees that the overall
quality of the classification is much higher for the random, representative sample
$\mathcal{D}_0$ than for the biassed sample $\mathcal{S}$; this agrees with the findings of
Newling et al. (2011) and Richards et al. (2012). Nonetheless, although the purity obtained
for the sample $\mathcal{S}$ is quite low, the completeness is very high. This is most
likely because of the biassed nature of $\mathcal{S}$, in which about 51 per cent of SNe are
Type-Ia, whereas $\mathcal{P}$ contains only 22 per cent Type-Ia. Thus, the classifier has
not been trained with a representative collection of non-Ia SNe, and hence often
misclassifies them as Type-Ia. It is worth noting that the 'toy classifier', which simply
classifies all SNe in $\mathcal{P}$ as Type Ia, would yield $\epsilon_{\rm Ia} = 1.00$,
$\tau_{\rm Ia} = 0.22$ and $\mathcal{F}_{\rm Ia} = 0.09$.
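The toy-classifier figure of merit can be checked by hand. The short derivation below assumes the SNPCC figure of merit has the form $(1/N^{\rm total})(N^{\rm true})^2/(N^{\rm true} + W N^{\rm false})$ with $W = 3$, and writes $N$ for the number of SNe in the photometric sample, of which 22 per cent are Type Ia.

```latex
\begin{align}
N_{\rm Ia}^{\rm true} &= N_{\rm Ia}^{\rm total} = 0.22\,N, \qquad
N_{\rm Ia}^{\rm false} = 0.78\,N, \\
\mathcal{F}_{\rm Ia} &= \frac{1}{0.22\,N}\,
  \frac{(0.22\,N)^2}{0.22\,N + 3 \times 0.78\,N}
  = \frac{0.22}{0.22 + 2.34} \approx 0.09 .
\end{align}
```

The sample size cancels, so the result depends only on the Type Ia fraction and the penalty factor.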

Inclusion of redshift information results in a modest 5-10 per cent improvement in
$\epsilon_{\rm Ia}$, $\tau_{\rm Ia}$ and $\mathcal{F}_{\rm Ia}$ for each random sample
$\mathcal{D}_p$, but only a very marginal improvement for $\mathcal{S}$. In Fig. 3 we show
the completeness, purity and figure of merit for Type Ia classification as a function of SNe
redshift $z$ for the training samples $\mathcal{D}_1$ and $\mathcal{D}_4$. Apart from at
very low $z$, where the results are slightly worse mainly due to there being fewer SNe, no
clear trend with redshift is apparent.

An important feature of probabilistic classification is that it allows one to investigate
the quality of the classification as a function of the threshold probability. Moreover, as
pointed out in Feroz et al. (2008), it also allows one to calculate the expected
completeness, purity and figure of merit as a function of $p_{\rm th}$, without knowing the
true classes of the SNe in the testing sample (as is the case in classification of real SNe
data, in the absence of spectroscopic follow-up observations). Equally, these expected
values can be calculated without the need to perform an average explicitly over realisations
of the testing sample.

Let us assume that the predicted probabilities for each SN being of Type Ia are given by
$p_{{\rm Ia},i}$. The expected values of the total number $\hat{N}_{\rm Ia}^{\rm total}$ of
Type Ia SNe in the sample, the number $\hat{N}_{\rm Ia}^{\rm true}$ of SNe correctly
predicted to be of Type Ia, and the number $\hat{N}_{\rm Ia}^{\rm false}$ of SNe incorrectly
predicted to be of Type Ia can then be calculated as follows:



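$$ \hat{N}_{\rm Ia}^{\rm total} = \sum_{i=1}^{N} p_{{\rm Ia},i}, \qquad (12) $$

$$ \hat{N}_{\rm Ia}^{\rm true} = \sum_{i=1,\; p_{{\rm Ia},i} > p_{\rm th}}^{N} p_{{\rm Ia},i}, \qquad (13) $$

$$ \hat{N}_{\rm Ia}^{\rm false} = \sum_{i=1,\; p_{{\rm Ia},i} > p_{\rm th}}^{N} \left(1 - p_{{\rm Ia},i}\right), \qquad (14) $$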



where $N$ is the total number of SNe classified and $p_{\rm th}$ is the threshold
probability. We can then use Eqs. (9), (10) and (11) to calculate the expected values of
completeness $\hat{\epsilon}_{\rm Ia}$, purity $\hat{\tau}_{\rm Ia}$ and figure of merit
$\hat{\mathcal{F}}_{\rm Ia}$ as a function of threshold probability $p_{\rm th}$.
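The expected quality measures need only the network output probabilities, not the true classes. A minimal sketch, assuming the expected counts are the sums of p (all SNe), of p over selected SNe, and of 1 - p over selected SNe, and that they are combined in the same way as the actual counts:

```python
def expected_metrics(p_ia, p_th=0.5, W=3.0):
    """Expected (completeness, purity, figure of merit) from probabilities alone."""
    selected = [p for p in p_ia if p > p_th]
    n_total = sum(p_ia)                        # expected number of Type Ia SNe
    n_true = sum(selected)                     # expected correct Type Ia candidates
    n_false = sum(1.0 - p for p in selected)   # expected incorrect candidates
    if n_total == 0:
        raise ValueError("all probabilities are zero")
    if not selected:
        # Nothing passes the threshold, so purity is undefined.
        return 0.0, float("nan"), 0.0
    completeness = n_true / n_total
    purity = n_true / (n_true + n_false)
    figure_of_merit = n_true ** 2 / (n_total * (n_true + W * n_false))
    return completeness, purity, figure_of_merit
```

Evaluating this function on a grid of `p_th` values gives expected curves that can be compared with the actual ones whenever true classes are available. These estimates are only as good as the calibration of the network probabilities.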

In Fig. 4 we plot the actual and expected values of completeness and purity for the samples
$\mathcal{D}_1$ and $\mathcal{D}_4$, trained without redshift information. Plots for
networks trained with redshift information are quite similar and therefore we do not show
them. One sees that the expected completeness and purity curves match quite well with the
corresponding actual ones. Thus, in principle, rather than arbitrarily choosing the value
$p_{\rm th} = 0.5$ (say), which was used to produce the results in Table 2 and Fig. 3, one
could instead choose a value of $p_{\rm th}$ designed to achieve a target overall
completeness and/or purity for a given survey.

We also plot the actual and expected Receiver Operating Characteristic (ROC) curves (see
e.g. Fawcett 2006) for our analysis procedure in Fig. 4. The ROC curve provides a very
reliable way of selecting the optimal algorithm in signal detection theory. We employ the
ROC curve here to analyse our SNe classification criterion, based on the threshold
probability $p_{\rm th}$. The ROC curve plots the True Positive Rate (TPR) against the False
Positive Rate (FPR) as a function of the threshold probability. TPR is, in fact, identical
to completeness (and also equals the 'power' of the classification test in a Neyman-Pearson
sense); it may also be defined as the ratio of the number of true positives for a given
$p_{\rm th}$ to the number of true positives for $p_{\rm th} = 0$. Conversely, FPR is the
ratio of the number of false positives for a given $p_{\rm th}$ to the number of false
positives for $p_{\rm th} = 0$, which is also often referred to as the contamination (or the
Neyman-Pearson type-I error rate; see footnote 1). A perfect binary classification method
would yield a ROC curve in the form of a right-angle connecting the points (0, 0) and (1, 1)
in the ROC space via the upper left corner (0, 1). A completely random classifier would
yield a diagonal line connecting (0, 0) and (1, 1) directly.

Figure 3. Completeness, purity and figure of merit for Type-Ia SNe classification as a
function of redshift ($z$) from applying trained networks with a threshold probability
($p_{\rm th}$) of 0.5; 'with z'/'without z' indicates that redshift information was/was not
used in network training. Samples $\mathcal{D}_1$ and $\mathcal{D}_4$ use 10 and 40 per cent
of the data, respectively, for training.

Figure 4. True and expected Receiver Operating Characteristic (ROC), completeness and purity
as a function of threshold probability $p_{\rm th}$ from applying trained networks to get
Ia/non-Ia classification probabilities on the testing data-set. No redshift information was
used in network training. Samples $\mathcal{D}_1$ and $\mathcal{D}_4$ use 10 and 40 per cent
of the data, respectively, for training.

One sees from the right-hand panel in Fig. 4 that our method yields very reasonable ROC
curves, indicating that the classifiers are quite discriminative. One also sees that our
expected ROC curves match well with the corresponding actual ones. Thus, in principle, one
could 'optimise' the classifier by choosing $p_{\rm th}$ in a number of possible ways. For
example, from among the numerous possibilities, one could choose $p_{\rm th}$ such that it
corresponds to the point on the expected ROC curve where, either: (i) the ROC curve crosses
the straight line connecting the points (0, 1) and (1, 0) in the ROC space; or (ii) the
straight line joining the point (1, 0) to the ROC curve intersects it at right-angles.
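Criterion (i) can be sketched as a search over candidate thresholds for the ROC point closest to the line TPR = 1 - FPR. For simplicity this sketch uses true class labels to build the ROC points, whereas the text suggests applying the criterion to the expected ROC curve; the function names and the discrete grid of thresholds are assumptions made for illustration.

```python
def roc_point(p_ia, is_ia, p_th):
    """Return (FPR, TPR) for a given threshold probability."""
    n_pos = sum(1 for truth in is_ia if truth)
    n_neg = len(is_ia) - n_pos
    tp = sum(1 for p, truth in zip(p_ia, is_ia) if truth and p > p_th)
    fp = sum(1 for p, truth in zip(p_ia, is_ia) if not truth and p > p_th)
    return fp / n_neg, tp / n_pos


def threshold_on_antidiagonal(p_ia, is_ia, thresholds):
    """Pick the threshold whose ROC point lies closest to TPR = 1 - FPR."""
    def distance(p_th):
        fpr, tpr = roc_point(p_ia, is_ia, p_th)
        return abs(tpr - (1.0 - fpr))
    return min(thresholds, key=distance)
```

At the selected threshold the fraction of missed Type Ia SNe roughly equals the fraction of non-Ia SNe wrongly accepted, which is one of several reasonable operating points.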



1 It is worth noting that, in terms of conditional probabilities, completeness
is simply Pr(classified as Ia|Ia), purity is its Bayes' theorem complement
Pr(Ia|classified as Ia), and contamination is Pr(classified as Ia|non-Ia).



5 DISCUSSION AND CONCLUSIONS 

We have presented a new method for performing automated pho- 
tometric of SNe into Type-la and non-la. In our a two-stage ap- 
proach, the SNe lightcurves are first fitted to an analytic parame- 
terised function, and the resulting parameters, together with a few 
further statistics associated with the fit, are then used as the in- 
put feature vector to a classification neural network whose output 
is the probability that the SN is of Type-la. Assuming a canonical 
threshold output probability p t h = 0.5, when we train the method 
using a random sample of 10 (40) per cent of the updated simulated 
data set released following the SuperNova Photometric Classifica- 
tion Challenge (post-SNPCC), making no selection cuts, we find 
that it yields robust classification results, namely a completeness of 
0.78 (0.82), purity of 0.77 (0.82), and SNPCC figure-of-merit of 
0.41 (0.50). A modest 5-10 per cent improvement in these results 
is achieved by also including the SN host-galaxy redshift and its 
uncertainty as inputs to the classification network. The quality of 
the classification does not depend strongly on the SN redshift. 
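
The relation between the three quoted quantities can be checked with a 
short Python sketch. It assumes the SNPCC definition of the 
figure-of-merit as completeness multiplied by a 'pseudo-purity' in which 
each false positive carries a penalty weight of 3; that weight is not 
stated in this section and is taken here as an assumption, and the 
function name snpcc_figure_of_merit is hypothetical. 

```python
def snpcc_figure_of_merit(completeness, purity, w_false=3.0):
    """Figure-of-merit from completeness and purity.

    FoM = completeness * N_true / (N_true + w_false * N_false), where
    N_true and N_false are the correctly and incorrectly classified
    objects among those classed as Type-Ia. The default w_false = 3 is
    an assumption based on the SNPCC definition.
    """
    if purity <= 0.0:
        return 0.0
    # N_false / N_true follows directly from purity = N_true / (N_true + N_false)
    false_to_true = (1.0 - purity) / purity
    pseudo_purity = 1.0 / (1.0 + w_false * false_to_true)
    return completeness * pseudo_purity


fom_10_per_cent = snpcc_figure_of_merit(0.78, 0.77)  # about 0.41
fom_40_per_cent = snpcc_figure_of_merit(0.82, 0.82)  # about 0.49
```

The first value reproduces the quoted 0.41. The second comes out 
slightly below the quoted 0.50, a difference that is within what the 
rounding of completeness and purity to two decimal places allows. 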

It is difficult to perform a direct comparison of our results with 
those submitted to the original SuperNova Photometric Classification 
Challenge, which are summarised in Kessler et al. (2010). As 
pointed out in that paper, the original challenge data set suffered 
from a number of bugs; these were subsequently corrected before 
the release of the post-SNPCC data set used in this paper. The lat- 
ter also benefited from further improvement in the generation of the 
simulations, leading to more realistic SNe lightcurves. It is hard to 
assess how these differences affect the difficulty of classifying the 
SNe. For example, in the original SNPCC data set, non-Ia SNe were 
too dim on average, which made classification of Type-Ia SNe 
easier. Conversely, participants in the original SNPCC were given the 
spectroscopically confirmed sample S of 1,103 SNe and asked to 
predict the type of SNe in the simulated sample V of 19,792 SNe. 
As discussed in Sec. 2, the fact that S is not a representative 
training sample makes classification more difficult than if one simply 
uses random training samples, on which most of our analysis has 
been focussed. Despite these caveats, for reference we note that the 
original challenge entry with the highest SNPCC figure-of-merit, 
~ 0.4 (averaged over SN redshift bins), achieved an overall 
completeness of 0.96 and purity of 0.79, although the quality of the 
classification varied considerably with SN redshift. 

More recent works by Newling et al. (2011), Richards et al. (2012) 
and Ishida & de Souza (2012) analyse the same post-SNPCC data set D 
used in this paper. A meaningful comparison 
of their results with our own is still not straightforward, however, 
since all three studies make different choices for the nature and size 
of the subsets of D used for training and testing, which also differ 
from the choices made in this paper. Nonetheless, some broad com- 
parisons are possible. 

The most straightforward comparison is with Newling et al. (2011), 
who present results using, as we do, the subset S and also 
various random subsets of D (which they call 'representative 
samples') for training their classifier. For S, their KDE method 
achieves a figure-of-merit of 0.37 (0.39) without (with) the inclusion 
of host-galaxy redshift information, whereas their boosting method 
yields a figure-of-merit of 0.15 using redshift information. Turning 
to their analysis of representative samples, from figure 15 in their 
paper, one sees that their boosting method, which is the more 
successful of their two methods on the representative samples, achieves 
figures-of-merit of ~ 0.45 and ~ 0.55 for training sets containing 
~ 2000 and ~ 8000 SNe, respectively, which correspond roughly 
to the sizes of our D_1 and D_4 training data sets. These classification 
results are obtained from the remaining SNe in D, after including 
SN host-galaxy redshift information, and are very similar to the 
equivalent figures-of-merit achieved by our own classifier, as listed 
in Table 2. Unfortunately, Newling et al. (2011) do not give the 
corresponding completeness and purity values, so it is not 
possible to compare these quantities with our own. 

Both Richards et al. (2012) and Ishida & de Souza (2012) 
adopt very different approaches from the above, and from each 
other, for choosing the nature and size of the subsets of D used for 
training. Indeed, each of these works considers a range of training 
sets. Richards et al. (2012) do, however, consider training using the 
spectroscopically-confirmed sample S, and obtain a purity of 0.50 
(0.54), a completeness of 0.50 (0.90) and a figure-of-merit of 0.13 
(0.25) without (with) the inclusion of redshift information. They 
also construct three further classes of training set, two of which 
contain several examples, each requiring the same fixed amount of 
spectroscopic follow-up time assumed in the construction of the 
original sample S. These are: (i) SNe observed in order of 
decreasing brightness; (ii) (r band) magnitude-limited surveys down to 
23.5, 24, 24.5 and 25th magnitude, respectively; and (iii) 
redshift-limited surveys out to z = 0.4 and 0.6, respectively. In 
applying their classifier to the remaining part of the post-SNPCC 
photometric sample V, their best classification results are a 
completeness of 0.65 (0.74), a purity of 0.72 (0.76) and a 
figure-of-merit of 0.31 
(0.36) without (with) the inclusion of redshift information; these 
were obtained from the deepest magnitude-limited survey, which 
contained only 165 SNe. It is worth noting that the deeper training 
sets have an SN class composition that closely resembles that of the 
full data set D, and so begin to approximate a representative training 
sample. 

The training sets considered by Ishida & de Souza (2012) are 
closer in spirit to the one used in the original SNPCC. Starting with 
the spectroscopically-confirmed subsample S, as a requirement of 
their method they impose selection cuts such that every SN must 
have at least one observation epoch with t ≤ t_low and one with 
t ≥ t_up in all available filters. In addition, each SN must have at 
least 3 observations above a given signal-to-noise ratio (SNR) in 
each filter. The same selection cuts are also applied to the photo- 
metric subsample V to produce the corresponding testing sets. Of 
the selection cuts considered, their sample D5 with SNR > is the 
least restrictive, yielding training and testing sets containing 830 
and 15,988 SNe, respectively. On these demanding data their clas- 
sifier yields a completeness of 0.44, a purity of 0.37, and a figure- 
of-merit of 0.06. 

From these various comparisons, we conclude that the method 
presented in this paper is indeed competitive, in spite of its 
relative simplicity, and yields reasonably robust classifications. Finally, 
we note that, aside from its relative simplicity and robustness, the 
classification method we have presented can be extended and im- 
proved in a number of ways. In particular, its use of probabilistic 
classification allows one to calculate expected completeness, pu- 
rity and figure-of-merit (amongst other measures), without know- 
ing the true classes of the SNe in the testing sample, as will be the 
case in the classification of real SNe. This allows one to tailor the 
method by adjusting the output threshold probability p_th to achieve 
a given completeness, purity or figure-of-merit. Alternatively, one 
could use the expected ROC curve to optimise the value of p_th used 
for classification. We plan to investigate these possibilities in a fu- 
ture work. 
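
A minimal Python sketch of how such tailoring might look is given 
here. It assumes that the network outputs are well-calibrated 
probabilities, so that the expected number of true (false) positives 
above a threshold is the sum of p (of 1 - p) over the selected SNe, and 
it again assumes a false-positive weight of 3 in the figure-of-merit. 
The function names expected_metrics and threshold_for_purity are 
hypothetical, and the estimator shown is one natural reading of the 
idea rather than a statement of the exact expressions used elsewhere 
in this paper. 

```python
import numpy as np


def expected_metrics(p, p_th, w_false=3.0):
    """Expected completeness, purity and figure-of-merit at threshold p_th.

    Uses only the output probabilities p, never the true classes.
    """
    p = np.asarray(p, dtype=float)
    selected = p >= p_th                 # SNe classed as Type-Ia
    exp_tp = p[selected].sum()           # expected true positives
    exp_fp = (1.0 - p[selected]).sum()   # expected false positives
    exp_ia = p.sum()                     # expected Type-Ia SNe in total
    if not selected.any() or exp_ia == 0.0:
        return 0.0, 0.0, 0.0
    completeness = exp_tp / exp_ia
    purity = exp_tp / (exp_tp + exp_fp)
    pseudo_purity = exp_tp / (exp_tp + w_false * exp_fp)
    return completeness, purity, completeness * pseudo_purity


def threshold_for_purity(p, target, grid=None):
    """Smallest threshold on the grid whose expected purity reaches target.

    Returns None if no threshold on the grid achieves the target.
    """
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)
    for t in grid:
        if expected_metrics(p, t)[1] >= target:
            return float(t)
    return None
```

Choosing the smallest qualifying threshold keeps the expected 
completeness as high as possible for the requested purity; the same 
scan could equally be used to maximise the expected figure-of-merit. 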